ggml-cpu : split arch-specific implementations #13892

xctan · 2025-05-29T13:38:45Z

This PR reworks the arch-specific code organization, following the discussion in #13720. Key changes include splitting the former ggml-cpu-quants.c and ggml-cpu-aarch64.cpp into several more focused files. Additionally, aarch64-related naming and identifiers, originally specific to AArch64 but now applicable to other architectures, have been updated to use repack to improve clarity and avoid confusion.

xctan · 2025-05-29T13:51:12Z

I'll need a bit more time to get this PR rebased onto the latest master branch.

xctan · 2025-05-31T06:29:59Z

@ggerganov @slaren
Now that all CI checks have passed, could you please review this PR?

ggerganov · 2025-06-04T12:29:19Z

ggml/src/ggml-cpu/ggml-cpu-impl.h

+
+#define GGML_DO_PRAGMA_(x) _Pragma (#x)
+#define GGML_DO_PRAGMA(x) GGML_DO_PRAGMA_(x)
+#if defined(GGML_CPU_GENERIC) || defined(__HIPCC__)
+// weak alias not working
+# define GGML_WEAK_ALIAS(name, alias)
+#elif defined(__GNUC__)
+// GCC/Clang on *nix
+# define GGML_WEAK_ALIAS(name, alias) GGML_DO_PRAGMA(weak name = alias) // NOLINT
+#elif defined(_MSC_VER) && defined (_WIN64)
+// MSVC
+// Note: C name mangling varies across different calling conventions
+// see https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170
+# define GGML_WEAK_ALIAS(name, alias) GGML_DO_PRAGMA(comment(linker, "/alternatename:" #name "=" #alias))
+#else
+# error "Unsupported compiler for GGML_WEAK_ALIAS"
+#endif
+
+#define GGML_CPU_NATIVE_IMPL(name) GGML_WEAK_ALIAS(name, name ## _generic)


"generic" implementations such as ggml_vec_dot_q4_0_q8_0_generic are always defined, regardless of the build configuration. By making these definitions "weak", we allow them to be overridden by arch-specific implementations.

Is my understanding of this logic correct?

xctan · 2025-06-04

Yes, on supported targets. The linker will first attempt to resolve implemented arch-specific symbols (like ggml_vec_dot_q4_0_q8_0). If an optimized version is not found, it automatically falls back to the _generic variants. I believe this approach offers improved maintainability at the cost of a slight increase in binary size due to these shadowed fallback implementations.
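The fallback described above can be illustrated with a minimal sketch, assuming GCC or Clang on an ELF target such as Linux (Mach-O targets and the MSVC /alternatename route behave differently). The names scale_generic, scale, scale_twice and check_fallback are hypothetical and not part of ggml; the pragma only mirrors what GGML_CPU_NATIVE_IMPL(name) expands to on the __GNUC__ branch of the macro quoted above.

```c
#include <assert.h>

// Hypothetical scalar fallback; always compiled, like the *_generic functions.
int scale_generic(int x) {
    return 2 * x;
}

// Declaration of the "native" entry point that callers use.
int scale(int x);

// Make `scale` a weak alias of `scale_generic`. If another object file
// provides a strong definition of `scale`, the linker prefers that one;
// otherwise calls to `scale` end up in `scale_generic`.
#pragma weak scale = scale_generic

// Callers only ever refer to the native name.
int scale_twice(int x) {
    return scale(scale(x));
}

// Small self-check: with no strong `scale` linked in, the generic code runs.
void check_fallback(void) {
    assert(scale(3) == 6);
    assert(scale_twice(1) == 4);
}
```

Linking this file together with a second object that defines a non-weak scale would route every call to that definition instead, while scale_generic stays in the binary. That leftover copy is the small size cost mentioned in the reply.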

ggerganov · 2025-06-04T12:45:23Z

ggml/src/ggml-cpu/arch/x86/quants.c

+#endif
+    for (; ib < nb; ++ib) {
+        int sumi0 = 0;
+        int sumi1 = 0;
+
+        for (int j = 0; j < qk/2; ++j) {
+            const int v0 = (x[ib].qs[j] & 0x0F) - 8;
+            const int v1 = (x[ib].qs[j] >>   4) - 8;
+
+            sumi0 += (v0 * y[ib].qs[j]);
+            sumi1 += (v1 * y[ib].qs[j + qk/2]);
+        }
+
+        int sumi = sumi0 + sumi1;
+        sumf += sumi*GGML_FP16_TO_FP32(x[ib].d)*GGML_FP16_TO_FP32(y[ib].d);
+    }
+
+    *s = sumf;
+}


Currently, each arch re-implements the generic version. For this big refactoring, this is probably the correct approach in order to minimize the risk of introducing bugs. After we merge, should we consider ways to de-duplicate the scalar implementations by reusing the generic calls?

xctan · 2025-06-04

That should be feasible, though it's not as straightforward for tail elements like the ones in this function. Let's keep the deduplication work for another PR to make future bisecting easier.
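One possible shape of that de-duplication, as a sketch only and not the approach taken in this PR or any follow-up: the scalar loop is moved into a helper that accepts a starting block index, so an arch-specific kernel can hand over whatever blocks its vector loop did not cover. All names here (blk_q4, blk_q8, dot_q4_q8_tail, dot_q4_q8_generic, dot_q4_q8_arch) are hypothetical, and the block scale is stored as a plain float for brevity, whereas the code in the diff converts it with GGML_FP16_TO_FP32.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32  // values per block (assumed here for illustration)

// Simplified blocks; the scale is a float instead of fp16 in this sketch.
typedef struct { float d; uint8_t qs[QK / 2]; } blk_q4;  // two 4-bit values per byte
typedef struct { float d; int8_t  qs[QK];     } blk_q8;

// Hypothetical shared helper: scalar dot product over blocks [ib, nb).
static float dot_q4_q8_tail(const blk_q4 *x, const blk_q8 *y, int ib, int nb) {
    assert(0 <= ib && ib <= nb);
    float sumf = 0.0f;
    for (; ib < nb; ++ib) {
        int sumi0 = 0;
        int sumi1 = 0;
        for (int j = 0; j < QK / 2; ++j) {
            const int v0 = (x[ib].qs[j] & 0x0F) - 8;  // low nibble
            const int v1 = (x[ib].qs[j] >> 4) - 8;    // high nibble
            sumi0 += v0 * y[ib].qs[j];
            sumi1 += v1 * y[ib].qs[j + QK / 2];
        }
        sumf += (float)(sumi0 + sumi1) * x[ib].d * y[ib].d;
    }
    return sumf;
}

// Generic version: the whole range goes through the helper.
float dot_q4_q8_generic(const blk_q4 *x, const blk_q8 *y, int nb) {
    return dot_q4_q8_tail(x, y, 0, nb);
}

// Stand-in for an arch-specific version. A real kernel would use SIMD in the
// main loop; here the loop just consumes blocks in pairs to leave a tail.
float dot_q4_q8_arch(const blk_q4 *x, const blk_q8 *y, int nb) {
    float sumf = 0.0f;
    int ib = 0;
    for (; ib + 1 < nb; ib += 2) {
        sumf += dot_q4_q8_tail(x, y, ib, ib + 2);  // placeholder for vector code
    }
    // Leftover blocks reuse the shared scalar code instead of a local copy.
    return sumf + dot_q4_q8_tail(x, y, ib, nb);
}
```

The difficulty raised in the reply shows up in the last line: the helper has to accept the partial sum and the position reached by the vector loop, which is more than simply calling the existing _generic function on the full input.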

ggerganov · 2025-06-05T10:25:58Z

Noting that #12995, #13966 and #13996 would need to be reapplied. Sorry about the conflict; I should have been more careful and waited before merging these. Let's first finish the review and then resolve these.

xctan force-pushed the split-arch branch from 4e0db43 to 2599c59 Compare May 29, 2025 13:59

github-actions bot added the ggml label May 29, 2025

xctan force-pushed the split-arch branch from 272b86e to 87ed654 Compare May 30, 2025 19:30

xctan marked this pull request as ready for review May 30, 2025 21:06

ggerganov self-requested a review June 4, 2025 12:19

ggerganov reviewed Jun 4, 2025

github-actions bot added the Nvidia GPU and SYCL labels Jun 4, 2025

ggerganov reviewed Jun 5, 2025

xctan added 16 commits June 6, 2025 00:34

6814bd4 move ggml-cpu-aarch64 to repack
a07340a split quantize_row_q8_0/1
82d7410 split helper functions
ead5762 split ggml_vec_dot_q4_0_q8_0
627e1ec split ggml_vec_dot_q4_1_q8_1
9582518 split ggml_vec_dot_q5_0_q8_0
beca219 split ggml_vec_dot_q5_1_q8_1
a32715a split ggml_vec_dot_q8_0_q8_0
a46eca7 split ggml_vec_dot_tq1_0_q8_K
96a7f51 split ggml_vec_dot_tq2_0_q8_K
5f881c9 split ggml_vec_dot_q2_K_q8_K
91fbf27 split ggml_vec_dot_q3_K_q8_K
58b6c62 split ggml_vec_dot_q4_K_q8_K
6272e0c split ggml_vec_dot_q5_K_q8_K
7c7223f split ggml_vec_dot_q6_K_q8_K
9671c0e split ggml_vec_dot_iq2_xxs_q8_K

xctan and others added 29 commits June 6, 2025 00:34

6df3dd5 rename ggml-cpu-traits
3566ee8 rename arm folder
f40ad8c move cpu-feats-x86.cpp
1ac2d5e rename ggml-cpu-hbm
321b3ac update arm detection macro in quants.c
7b5bf50 move iq quant tables
bf3dbea split ggml_quantize_mat_q8_0/K
868c895 split ggml_gemv_*
6a2ba77 split ggml_gemm_*
72ddf5a rename namespace aarch64 to repack
ad52349 use weak aliases to replace test macros
62dc3fd rename GGML_CPU_AARCH64 to GGML_CPU_REPACK
46b1e49 rename more aarch64 to repack
5601df6 clean up rebase leftover
827aec0 fix compilation errors
58210b8 remove trailing spaces
2739f4c try to fix clang compilation errors
8713f87 try to fix clang compilation errors again
df27810 try to fix clang compilation errors, 3rd attempt
553d8ca try to fix clang compilation errors, 4th attempt
9bfcd7e try to fix clang compilation errors, 5th attempt
08ebdd9 try to fix clang compilation errors, 6th attempt
67eceec try to fix clang compilation errors, 7th attempt
01a1c5c try to fix clang compilation errors, 8th attempt
bef5b8d try to fix clang compilation errors, 9th attempt
47701d5 more cleanup
e5b6fdb fix compilation errors
2573662 fix apple targets
93e6718 fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

xctan force-pushed the split-arch branch from 9fa5247 to 93e6718 Compare June 5, 2025 16:35
